# Use Dockerized ESA SNAP toolbox to create features for classification

When developing an algorithm, it is convenient to work with one specific dataset that is mounted on the local machine.
To run the same notebook as a Job in Hopsworks, the script must instead accept command line arguments.
This is because Hopsworks automatically uses [`nbconvert`](https://nbconvert.readthedocs.io/en/latest/) to turn the notebook into a regular Python script.
The `RUN_AS_JOB` switch makes it easy to move between these two modes.

In [54]:
import pathlib

RUN_AS_JOB = False # Make this True when using it as a Job

if RUN_AS_JOB:
    # When running as a job in Hopsworks, the script is run as a command line application
    # so it needs to accept input parameters
    import argparse

    p = argparse.ArgumentParser()
    p.add_argument('product_folder')
    args = p.parse_args()

    product_folder = args.product_folder
else:
    # When developing, it's convenient to just specify some variables directly in the notebook
    product_folder = '/eodata/Sentinel-1/SAR/GRD/2021/04/14/S1A_EW_GRDM_1SDH_20210414T064754_20210414T064858_037443_0469F8_FFAD.SAFE'
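
To check the job branch without submitting anything, `parse_args` accepts an explicit argument list. The sketch below wraps the same parser in a hypothetical helper, `parse_product_folder`, which is not part of the original notebook; the path passed to it is a made-up placeholder.

```python
import argparse


def parse_product_folder(argv=None):
    """Return the product folder given on the command line.

    With argv=None, argparse reads sys.argv[1:], which is what happens
    when the converted notebook runs as a Job.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('product_folder')
    return parser.parse_args(argv).product_folder


# Simulate the command line arguments the Job would receive.
# The path is a placeholder, not a real product.
example_folder = parse_product_folder(['/eodata/example/SCENE.SAFE'])
print(example_folder)
```

Passing the list explicitly keeps the parser away from `sys.argv`, which inside a notebook kernel typically holds the kernel's own launch arguments rather than anything meant for the script.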

Processing the data involves a few different paths.
The original data (`product_folder`) lives in a read-only S3 bucket on CreoDIAS.
To make this folder available to the Docker container, we mount it at a path inside the container.

In [55]:
product_folder = pathlib.Path(product_folder)

scene_id = product_folder.name
folder_inside_docker = f'/data/{scene_id}'

print(f'Processing {scene_id}')
print(f'Folder outside Docker: {product_folder}')
print(f'Folder inside Docker: {folder_inside_docker}')

Processing S1A_EW_GRDM_1SDH_20210414T064754_20210414T064858_037443_0469F8_FFAD.SAFE
Folder outside Docker: /eodata/Sentinel-1/SAR/GRD/2021/04/14/S1A_EW_GRDM_1SDH_20210414T064754_20210414T064858_037443_0469F8_FFAD.SAFE
Folder inside Docker: /data/S1A_EW_GRDM_1SDH_20210414T064754_20210414T064858_037443_0469F8_FFAD.SAFE
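
The scene name also carries the acquisition times, so the individual pieces can be picked out by splitting on underscores rather than by character position. The following is a rough sketch, assuming the name follows the standard Sentinel-1 product naming layout with the start time in the fifth underscore-separated field; `describe_scene` is a hypothetical helper, not something defined in the notebook.

```python
import pathlib


def describe_scene(product_folder):
    """Derive the scene ID, start time and Docker mount from a product folder."""
    # PurePosixPath keeps the result independent of the host operating system.
    product_folder = pathlib.PurePosixPath(product_folder)
    scene_id = product_folder.name
    fields = scene_id.split('_')
    folder_inside_docker = f'/data/{scene_id}'
    return {
        'scene_id': scene_id,
        # Assumption: the fifth field is the acquisition start time.
        'start_time': fields[4],
        'folder_inside_docker': folder_inside_docker,
        # Same "outside:inside" form as used for a Docker volume mapping.
        'volume': f'{product_folder}:{folder_inside_docker}',
    }


info = describe_scene(
    '/eodata/Sentinel-1/SAR/GRD/2021/04/14/'
    'S1A_EW_GRDM_1SDH_20210414T064754_20210414T064858_037443_0469F8_FFAD.SAFE'
)
print(info['start_time'])
print(info['volume'])
```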


In [71]:
from hops import jobs
from pprint import pprint

# We can now set up the Docker job to process the image. 
# We'll need to specify the Docker image, which command to run in the image, and what the command line arguments should be.

# The Docker Job needs a name - let's use the image timestamp from the scene ID.
# Hopsworks limits the name to at most 63 characters *after* it has added
# some things automatically, so the name needs to be quite short.
# Note: this slice keeps 14 characters of the start time, e.g. '20210414T06475'.
job_name = scene_id[17:31]

# Where to place the processed files
output_path = pathlib.Path(f'/Projects/PolarUsecase/Features/{scene_id}')

args = f"""/usr/local/snap/bin/gpt Calibration -f GeoTIFF -t {scene_id}.tif  -PselectedPolarisations=HH {folder_inside_docker}/manifest.safe 2>> {output_path / f'{scene_id}_stderr.txt'} 1>> {output_path / f'{scene_id}_stdout.txt'} || true"""

job_config = {
    'type': 'dockerJobConfiguration',
    'jobType': 'DOCKER',
    'imagePath': 'mundialis/esa-snap:ubuntu',
    'command': ['/bin/bash', '-c'],
    'defaultArgs': args,
    'volumes': [
        f'{product_folder}:{folder_inside_docker}',
    ],
    'outputPath': str(output_path),
}

job_info = jobs.create_job(job_name, job_config)
print(f'Command running in Docker container: \n{args}\n\n')
jobs.start_job(job_name)
# pprint() prints its argument and returns None, so call it on its own line
print('Job info:')
pprint(job_info)

Command running in Docker container: 
/usr/local/snap/bin/gpt Calibration -f GeoTIFF -t S1A_EW_GRDM_1SDH_20210414T064754_20210414T064858_037443_0469F8_FFAD.SAFE.tif  -PselectedPolarisations=HH /data/S1A_EW_GRDM_1SDH_20210414T064754_20210414T064858_037443_0469F8_FFAD.SAFE/manifest.safe 2>> /Projects/PolarUsecase/Features/S1A_EW_GRDM_1SDH_20210414T064754_20210414T064858_037443_0469F8_FFAD.SAFE/S1A_EW_GRDM_1SDH_20210414T064754_20210414T064858_037443_0469F8_FFAD.SAFE_stderr.txt 1>> /Projects/PolarUsecase/Features/S1A_EW_GRDM_1SDH_20210414T064754_20210414T064858_037443_0469F8_FFAD.SAFE/S1A_EW_GRDM_1SDH_20210414T064754_20210414T064858_037443_0469F8_FFAD.SAFE_stdout.txt || true


{'config': {'appName': '20210414T06475',
            'command': ['/bin/bash', '-c'],
            'cores': 1,
            'defaultArgs': '/usr/local/snap/bin/gpt Calibration -f GeoTIFF -t '
                           'S1A_EW_GRDM_1SDH_20210414T064754_20210414T064858_037443_0469F8_FFAD.SAFE.tif  '
                     

In [69]:
# At this point, the feature images should be available under Datasets/Features/{scene_id}
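
The shell command handed to the container above is built with plain string formatting, which works because Sentinel-1 scene names contain no spaces or shell metacharacters. As a more defensive variant, the sketch below quotes each path with `shlex.quote`. `build_gpt_command` is a hypothetical helper, and the `gpt` options are simply copied from the job above rather than verified independently.

```python
import pathlib
import shlex


def build_gpt_command(scene_id, folder_inside_docker, output_path):
    """Assemble the calibration command line with shell-quoted paths."""
    output_path = pathlib.PurePosixPath(output_path)
    manifest = f'{folder_inside_docker}/manifest.safe'
    stderr_log = output_path / f'{scene_id}_stderr.txt'
    stdout_log = output_path / f'{scene_id}_stdout.txt'
    return (
        '/usr/local/snap/bin/gpt Calibration -f GeoTIFF '
        f'-t {shlex.quote(scene_id + ".tif")} '
        '-PselectedPolarisations=HH '
        f'{shlex.quote(manifest)} '
        f'2>> {shlex.quote(str(stderr_log))} '
        f'1>> {shlex.quote(str(stdout_log))} || true'
    )


# Placeholder names for illustration only.
command = build_gpt_command(
    'SCENE.SAFE', '/data/SCENE.SAFE', '/Projects/P/Features/SCENE.SAFE'
)
print(command)
```

For names made up only of letters, digits, dots, underscores and slashes, `shlex.quote` returns the string unchanged, so the resulting command has the same shape as the one used in the job.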